ABSTRACT

In a program of Individually Prescribed Instruction
(IPI), where a student's progress through each level of a program of
study is governed by his performance on a test dealing with
individual behavioral objectives, there is considerable value in
keeping the number of items on each test at a minimum. The specified
test length for each objective must, however, be adequate to provide
sufficient information regarding the student's degree of mastery of
the behavioral objective being tested. Just what the minimum
acceptable length will be depends on the manner in which test
information is used to make decisions about individual students, the
level of functioning required for defining mastery of an objective,
the relative losses incurred in making false positive and false
negative decisions, the background information available on the
student and on the instructional process, and the premium on testing
time within the instructional process. Some broad guidelines
regarding test length of IPI posttests are included. A number of
tables present data on the probability of the student's achieving
mastery level. (Author/HV)
Prescribing Test Length For Criterion-Referenced Measurement*

* I. Posttests 



by 

Melvin R. Novick and Charles Lewis 

The American College Testing Program The University of Illinois 

and 

The University of Iowa 

Introduction 

In a program of Individually Prescribed Instruction (IPI), where a
student's progress through each level of a program of study is governed by
his performance on a test dealing with individual behavioral objectives,
there is considerable value in keeping the number of items on each test
at a minimum. The specified test length for each objective must, however,
be adequate to provide sufficient information regarding the student's degree
of mastery of the behavioral objective being tested. Just what the minimum
acceptable length will be depends on the manner in which test information
is used to make decisions about individual students, the level of
functioning required for defining mastery of an objective, the relative losses
incurred in making false positive and false negative decisions, the background
information available on the student and on the instructional process, and
the premium on testing time within the instructional process. Our purpose in
this paper is to discuss these issues and provide some broad guidelines for
test-length specification for IPI posttests. These specifications will be
tentative because of unresolved substantive and methodological issues, but
we believe that they should provide some improvement on current practice.
A separate, and rather more complex, treatment will be required for placement
and pretest length specification.

Background 

In a criterion-referenced measurement approach to Individually
Prescribed Instruction, we imagine a population of test items, having mixed
item difficulty, dealing with a particular objective and an ideal decision
which advances a student past this objective if he is able to answer at least
a given percentage of the items in the population. This minimum passing
percentage, the so-called criterion level, simply reflects the degree of
mastery deemed sufficient for this objective (although it implicitly involves
the difficulty of the items as well). The actual percentage of items that
a person would answer correctly in the population of items is called his
level of functioning. In practice, the advancement-retention decision must
be made from a small sample of observations (test items), and, hence, errors
in the decision process must be expected.

One common treatment of the test length problem in a
criterion-referenced measurement context has been given by Millman (1972). He
studied a standard decision rule which advances the student if the
percent of items correctly answered on a test equals or exceeds the
required criterion level. Here it is assumed that the items on the test
may be treated as a random sample from the population of items, and
that the obtained percentage correct is a useful estimate of the true
population percentage for the student. Using binomial probability
tables, Millman obtained the probability that a student with a given
true level of functioning would be incorrectly advanced or retained by
this procedure.

Table 1

Percent of Students Expected To Be Incorrectly
Advanced or Retained

Specified Criterion Level .70

Advancement      No. of         Student's True Level of Functioning*
      Score  Test Items   50   55   60   65 :   70   75   80   85   90   95
          6           7    6   10   16   23 :   67   55   42   28   15    4
          6           8   15   22   32   43 :   45   32   20   11    4    1
          7           9    9   15   23   34 :   54   40   26   14    5    1
          7          10   17   27   38   51 :   35   22   12    5    1
          8          11   11   19   30   43 :   43   29   16    7    2
          9          12    7   13   23   35 :   51   35   20    9    3
         10          13    5    9   17   28 :   58   42   25   12    3
         11          14    3    6   12   22 :   64   48   30   15    4
         12          15    2    4    9   17 :   70   54   35   18    6

Specified Criterion Level .75

Advancement      No. of         Student's True Level of Functioning*
      Score  Test Items   50   55   60   65   70 :   75   80   85   90   95
          6           8   15   22   32   43   55 :   32   20   11    4    1
          7           9    9   15   23   34   46 :   40   26   14    5    1
          8          10    6   10   17   26   38 :   47   32   18    7    1
          9          11    3    7   12   20   31 :   55   38   22    9    2
          9          12    7   13   23   35   49 :   35   20    9    3
         16          20    1    2    5   12   24 :   58   37   17    4
         17          21         1    4    9   20 :   63   41   20    5
         18          22         1    3    7   17 :   68   46   23    6

Specified Criterion Level .80

Advancement      No. of         Student's True Level of Functioning*
      Score  Test Items   50   55   60   65   70   75 :   80   85   90   95
          6           7    6   10   16   23   33   45 :   42   28   15    4
          7           8    4    7   11   17   26   37 :   50   34   19    6
          8           9    2    4    7   12   20   30 :   56   40   23    7
          8          10    6   10   17   26   38   53 :   32   18    7    1
          9          11    3    7   12   20   31   46 :   38   22    9    2
         10          12    2    4    8   15   25   39 :   44   26   11    2
         11          13    1    3    6   11   20   33 :   50   31   13    2
         12          15    2    4    9   17   30   46 :   35   18    6
         17          20         1    2    4   11   23 :   59   35   13    2
         19          22              1    3    7   16 :   67   42   17    2

Specified Criterion Level .85

Advancement      No. of         Student's True Level of Functioning*
      Score  Test Items   50   55   60   65   70   75   80 :   85   90   95
          7           8    4    7   11   17   26   37   50 :   34   19    6
          8           9    2    4    7   12   20   30   44 :   40   23    7
          9          10    1    2    5    9   15   24   38 :   46   26    9
         10          11    1    1    3    6   11   20   32 :   51   30   10
         11          12         1    2    4    9   16   28 :   56   34   12
         17          19              1    2    5   11   24 :   56   29    7
         19          21                   1    3    8   18 :   63   35    8

*The true level of functioning is the percent of items a student
would be able to answer correctly if he were given the entire universe
of items.

Students having true level of functioning values less than the specified
criterion level should fail a test composed of all items from this universe.
However, on any given test of finite length, some of these students will get
more than the minimum advancement percent of the items correct and be
considered as "passers". The expected percents of such incorrect advancements
are given in the body of the table to the left of the dotted line.

Students having true level of functioning values equal to or greater
than the minimum advancement percent should pass such a test. The percents
of these students who will be incorrectly retained are shown in the table
to the right of the dotted line.
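The entries of a table like Table 1 follow directly from the binomial model described above. The sketch below is only an illustration of that calculation, not a reproduction of Millman's procedure; the function names are hypothetical, and the printed percentages, which appear to have been read from rounded binomial tables, may differ from a directly computed value by about one percentage point.

```python
from math import comb


def prob_score_at_least(n_items, min_score, true_level):
    """P(X >= min_score) when X ~ Binomial(n_items, true_level)."""
    return sum(
        comb(n_items, k) * true_level**k * (1 - true_level) ** (n_items - k)
        for k in range(min_score, n_items + 1)
    )


def misclassification_prob(n_items, min_score, criterion, true_level):
    """Probability of a wrong decision for a student at true_level.

    Below the criterion the error is an incorrect advancement;
    at or above the criterion it is an incorrect retention.
    """
    p_advance = prob_score_at_least(n_items, min_score, true_level)
    if true_level < criterion:
        return p_advance
    return 1.0 - p_advance


if __name__ == "__main__":
    # Criterion .75, eight items, advancement score of six (hypothetical demo).
    for level in (0.50, 0.60, 0.70, 0.75, 0.85, 0.95):
        pct = 100 * misclassification_prob(8, 6, 0.75, level)
        print(f"true level {level:.2f}: {pct:5.1f}% misclassified")
```

Raising the advancement score in the call (for example from 6 to 7) shows the trade-off between the two kinds of error for a fixed test length.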

Table 1 expands on some of Millman's computations and gives the
conditional probability of incorrect advancement or retention for a variety
of true levels, test lengths, and minimum passing percentages. The first
impression this table provides is that a substantial proportion (sometimes
more than half) of the students with true levels close to, or at, the
criterion level will be incorrectly advanced or retained, at least for
the test lengths considered. There appears to be a slight improvement
in accuracy of decision as the test length increases from 8 to 22 items,
although this effect is largely hidden by fluctuation in the probabilities
due to changes in the percentage correct required for advancement. For example,
with a criterion level of .7, the percentage correct required for advancement
is .75, .78, .70, .73, or .75 for test lengths of 8, 9, 10, 11, or 12 items,
respectively. This brings up a question as to the optimality of the decision
procedure assumed in Table 1. To provide a framework for answering this
question, let us consider some of the issues involved.

Suppose seven out of eight were taken as the minimum advancement score
when the criterion level is .75; the probability of incorrect advancement
would decrease substantially for all students whose true level is below
the criterion level. This is shown in Table 2. On the other hand,
those above .75 suffer a substantial increase in their chances of being
incorrectly retained. Apparently, a more general framework is required
before even the decision procedure can be chosen, much less any judgment
made concerning minimum test length. This framework would need to take
into account on which side of .75 small expected errors were considered
to be more important.

Table 2

Percent of Students Expected To Be Incorrectly
Advanced or Retained
Criterion Level = .75     Test Length = 8

Advancement                   True Level
      Score   50   55   60   65   70 :   75   80   85   90   95
          6   15   22   32   43   55 :   32   20   11    4    1
          7    4    7   11   17   26 :   63   50   34   19    6


A Framework For Specifying Test Length 
Table 1 is very helpful in identifying the seriousness of the problem
of short tests. From a practical point of view, however, a solution to the
problem must involve looking at a different conditional probability, and
abandoning the simple decision procedure that Millman has so convincingly
demonstrated to be inadequate. Instead of the probability that a student
will attain a particular test score, given his true level, it is the
probability that a student's true level of functioning exceeds the specified
criterion level, given his test score, which is required in making a decision.
In other words, it is the test score, not the true level, which is given
(is observed) when deciding whether to advance or
retain the student. Thus, a student should be advanced only if the probability
that he has attained or surpassed the criterion level, given his test score,
is sufficiently high. To obtain the necessary probability, an application
of Bayes theorem is required. In such an analysis, prior knowledge
(expressed in probabilistic terms) of the student's true level of functioning
is combined with the (binomial) model information relating the observed
test score to true level, and the result is a posterior probability
distribution for true level of functioning, given test score. The
probability this distribution assigns to levels above the criterion
is the quantity of interest. In this formulation, the problem can be
described as selecting a minimum sample size and an advancement score, so
that students attaining that score will then have a sufficiently high
probability of having at least the minimum required level of functioning.

As a first approximation, let us suppose our knowledge of a student's
true level of functioning is vague, prior to having his test results.
If this state of knowledge is characterized by selecting a uniform
distribution on the interval from zero to unity for true level, π, Bayes
theorem provides the posterior probabilities listed in Table 3 for various
scores and test lengths. The posterior distributions on which these
probabilities are based all belong to the Beta family, and the parameters
in each case are those given in the table, primarily for future reference.

To generate a decision procedure on the basis of Table 3, we
must select a criterion level (π_o) and a minimum acceptable probability
that a student's true level (π) exceeds this criterion. Thus, for example,
we might take π_o = .80 and the minimum acceptable Prob(π ≥ π_o | x, n) = .50,
where x is test score and n is test length. We would then be saying that
we wanted to advance the student only if we were at least 50%
sure that his level of functioning was above .80. Then, using Table 3,
we see that with n = 8, all students having x ≥ 7 would advance to the
next objective, but not those with x = 6. For a test of 12 items, the
minimum advancement score would be 10 correct.
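The posterior probabilities under a uniform prior can be checked with a few lines of code. The sketch below assumes the uniform prior β(1, 1), so that the posterior after x correct answers on n items is β(x + 1, n - x + 1); for these integer parameters the upper tail of the Beta distribution equals a binomial sum, which avoids any special-function library. The function names are hypothetical and the code is an illustration, not the authors' computing procedure.

```python
from math import comb


def posterior_prob_above(score, n_items, criterion):
    """P(pi > criterion | score, n_items) under a uniform prior.

    The posterior is Beta(score + 1, n_items - score + 1). For integer
    parameters its upper tail equals P(Y <= score), where
    Y ~ Binomial(n_items + 1, criterion).
    """
    m = n_items + 1
    return sum(
        comb(m, k) * criterion**k * (1 - criterion) ** (m - k)
        for k in range(score + 1)
    )


def minimum_advancement_score(n_items, criterion, required_prob):
    """Smallest score whose posterior probability reaches required_prob."""
    for score in range(n_items + 1):
        if posterior_prob_above(score, n_items, criterion) >= required_prob:
            return score
    return None  # no score on this test is high enough


if __name__ == "__main__":
    for x in (6, 7, 8):
        print(x, round(posterior_prob_above(x, 8, 0.80), 3))
    print(minimum_advancement_score(8, 0.80, 0.50))
```

Because the tabled values are rounded to whole percents, a rule applied to unrounded probabilities can differ from one read off the table when a value sits very close to the cutoff. For instance, the unrounded probability for 10 correct out of 12 at a criterion of .80 falls just under .50.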

Note, however, that if we required 80% assurance that the true level
of functioning was above .80 [Prob(π ≥ .80) ≥ .80], then even those with
eleven correct responses to twelve items would not be advanced. We think
that it is unreasonable to require perfect performance as a standard for
advancement, and therefore we need to improve upon this analysis. One
way is to use a longer test, but we can, at least, hope to find a procedure
in which a 12-item test will be adequate.

Table 3

Probability Student's True Level Of Functioning Is
Greater Than π_o Given A Uniform Prior Distribution

    Minimum
Advancement      No. of     Posterior               Criterion Level π_o
      Score  Test Items  Distribution   50   55   60   65   70   75   80   85   90   95
          6           8       β(7, 3)   91   85   77   66   54   40   26   14    5    1
          7           8       β(8, 2)   98   96   93   88   80   70   56   40   23    7
          8           8       β(9, 1)  100  100   99   98   96   92   87   77   61   37
          7           9       β(8, 3)   95   90   83   74   62   47   32   18    7    1
          8           9       β(9, 2)   99   98   95   91   85   76   62   46   26    9
          9           9      β(10, 1)  100  100   99   99   97   94   89   80   65   40
          7          10       β(8, 4)   89   81   70   57   43   29   16    7    2
          8          10       β(9, 3)   97   93   88   80   69   54   38   22    9    2
          9          10      β(10, 2)   99   99   97   94   89   80   68   51   30   10
          8          11       β(9, 4)   93   87   77   65   51   35   21    9    3
          9          11      β(10, 3)   98   96   92   85   75   61   44   26   11    2
         10          11      β(11, 2)  100   99   98   96   92   84   73   56   34   12
          9          12      β(10, 4)   95   91   83   72   58   42   25   12    3
         10          12      β(11, 3)   99   97   94   89   80   67   50   31   13    2
         11          12      β(12, 2)  100  100   99   97   94   87   77   60   38   14


The results in Table 3, although they provide relevant information
for mastery decisions about students based on test scores, do not
take full advantage of the power which is available through the use
of prior knowledge. In particular, it will seldom be the case that our
knowledge of a student's true level is adequately described by a uniform
distribution. For example, our prior probability that a student is
functioning above a criterion level of .8 might be approximately .75.
This would be the case if historical data suggested that about 75% of
the students who completed a unit of Individually Prescribed Instruction
proved to be at or above mastery level. Moreover, we might judge the
strength of our knowledge to be roughly equivalent to that based on a
score from a 12-item test. (A method for making this assessment will be
referenced shortly.)

When working with a binomial model, it is convenient and generally
very satisfactory to select a member of the Beta class of distributions to
characterize prior beliefs (Novick and Jackson, 1974). If this is done, the
posterior distribution is easily obtained, and in every instance will again
be a member of the Beta family. In fact, if the prior distribution is
β(a, b) and x successes in n trials are observed, then the posterior
distribution is β(x + a, n - x + b). This can be seen in Table 3, where it is
noted that the uniform distribution is β(1, 1). If we restrict ourselves
to prior distributions in the Beta family, the beliefs specified in the
previous paragraph are characterized by β(10.254, 1.746). Given this prior
distribution and the indicated test results, the posterior distributions
and posterior probabilities of exceeding various criteria are provided in
Table 4. The precise stipulation of prior distributions must always be
done carefully, but extensive aids (Novick and Jackson, 1974; Novick,
Lewis, and Jackson, 1973) are available, and indeed an elaborate system
of Computer Assisted Data Analysis (CADA) is available (Novick, 1973) to
help an instructional decision maker specify his prior distribution. A yet
more sophisticated way of getting prior and posterior distributions for
each person is derived by Lewis, Wang, and Novick (1973), and the required
tables are given by Wang (1973). For the present, we shall suppose that
this work has been done carefully and that the prior distribution used in
the construction of Table 4 is appropriate.
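The Beta updating rule stated above is easy to illustrate numerically. The following sketch applies the update β(a, b) to β(x + a, n - x + b) and estimates the probability above a criterion by Simpson's rule. The choice of Simpson's rule, the step count, and the function names are assumptions made for this illustration (the paper does not say how its tables were computed), and the integration routine assumes both Beta parameters exceed 1 so that the density is finite and vanishes at the endpoints.

```python
import math


def posterior_parameters(prior_a, prior_b, score, n_items):
    """Beta(a, b) prior plus `score` successes in `n_items` trials."""
    return prior_a + score, prior_b + n_items - score


def beta_density(t, a, b):
    """Density of Beta(a, b) at t; taken as 0 at the endpoints (needs a, b > 1)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def prob_above(criterion, a, b, steps=4000):
    """Approximate P(pi > criterion) for Beta(a, b) by Simpson's rule.

    `steps` must be even; the default is an arbitrary illustrative choice.
    """
    h = (1.0 - criterion) / steps
    total = beta_density(criterion, a, b) + beta_density(1.0, a, b)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * beta_density(criterion + i * h, a, b)
    return total * h / 3.0


if __name__ == "__main__":
    prior = (10.254, 1.746)
    print("prior     P(pi > .80):", round(prob_above(0.80, *prior), 2))
    post = posterior_parameters(*prior, score=6, n_items=8)
    print("posterior", post, "P(pi > .80):", round(prob_above(0.80, *post), 2))
```

With the prior β(10.254, 1.746) and a score of six out of eight, the same routine can be used to compare the prior and posterior probabilities of exceeding a criterion of .80.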

Tables 3 and 4 demonstrate clearly the impact of prior knowledge
on our interpretation of test results. In Table 3, for example, the
posterior probability that a student with a score of six out of eight
items correct has a true level greater than .80 is only .26, whereas
in Table 4 this probability has increased to .60. This result should not
be surprising, in view of the fact that we have now set this probability
to be .75 apriori, as compared to .20 in Table 3. If we felt the chances
to be very good that the student had mastered an objective (to a level above
.8) before we saw the test results, then a score of six out of eight will
not substantially change our beliefs; it will lower the probability, but
aposteriori may still leave the odds in favor of mastery. In many
applications, a prior probability of mastery may be no more than .60, but
the results will still differ sharply from those obtained assuming vague
prior information. Note that if we were to adopt the rule that we will
advance a student if the aposteriori probability of mastery is at least
.50, then in this example, we will advance him if the prior distribution
were that of Table 4, but not if it were that of Table 3.

Table 4

Probability Student's True Level of Functioning Is
Greater Than π_o Given A β(10.254, 1.746) Prior Distribution

    Minimum
Advancement      No. of          Posterior               Criterion Level π_o
      Score  Test Items       Distribution   50   55   60   65   70   75   80   85   90   95
          6           8   β(16.254, 3.746)  100  100   98   96   90   78   60   37   15    2
          7           8   β(17.254, 2.746)  100  100  100   99   97   92   81   62   36   10
          8           8   β(18.254, 1.746)  100  100  100  100   99   98   94   85   66   32
          7           9   β(17.254, 3.746)  100  100   99   97   92   82   65   41   17    2
          8           9   β(18.254, 2.746)  100  100  100   99   98   93   84   66   39   11
          9           9   β(19.254, 1.746)  100  100  100  100  100   98   95   87   69   34
          7          10   β(17.254, 4.746)  100   99   97   93   84   68   47   24    7    1
          8          10   β(18.254, 3.746)  100  100   99   98   93   84   68   45   19    3
          9          10   β(19.254, 2.746)  100  100  100   99   98   95   86   69   42   12
          8          11   β(18.254, 4.746)  100   99   98   94   87   72   51   27    8    1
          9          11   β(19.254, 3.746)  100  100  100   98   95   87   72   48   22    3
         10          11   β(20.254, 2.746)  100  100  100  100   99   96   88   72   45   13
          9          12   β(19.254, 4.746)  100  100   98   96   89   76   55   30   10    1
         10          12   β(20.254, 3.746)  100  100  100   99   96   89   75   52   24    4
         11          12   β(21.254, 2.746)  100  100  100  100   99   96   90   75   48   14

Note: The mean and mode, respectively, of β(10.254, 1.746) are
.855 and .925, and for this distribution Prob(π ≥ π_o) for π_o = .70, .75,
.80, .85 are .92, .86, .75, and .59, respectively. A close look at these
distributional characteristics will help a decision maker determine if
this prior distribution is a realistic characterization of his beliefs.

When the decision maker specifies an informative prior distribution,
he is saying, in effect, that he wants a decision which will have a high
probability of being correct in that portion of the decision space in which
he thinks the student's ability truly lies. For example, referring to
Table 2, a decision maker with a high prior probability that the student had
a true level of functioning below .75 would, by virtue of his analysis,
require a minimum passing score of seven correct out of eight items. This
would assure him a low probability of misclassification for all values
below .75. Another decision maker with high prior probability that the
student was above criterion level would likely require only six out of
eight correct, and thus have a low probability of an incorrect decision for
values of .75 or above.

Once we have decided to work with the posterior probability that a
student's level of functioning exceeds some criterion, given his test
score, and have made use of our prior knowledge in obtaining this
probability, another issue remains to be settled before we can turn
to the question of test length. Simply stated, we need to know how sure
we should be that a student has mastered an objective at the chosen level
before we make the decision to allow him to advance to the next objective.
For instance, is a posterior probability of at least .5, as was used in
the last example, a reasonable choice in all cases? Almost certainly
this last question should be answered in the negative. The point at
issue here comes down to an understanding of the relative disutilities or
losses associated with the false positive and false negative errors.
If it were no more serious to advance a student whose level was below
the criterion than to retain a student who was above, we would be behaving
optimally if we were to advance students with posterior probabilities above
.5 and retain the others. In many situations the prior probability will be
this high, and hence an advancement decision could then be made on an apriori
basis. On the other hand, we might consider the loss to be twice as great
for a false advancement as for a false retention. In this case, we should
only advance those students whose posterior probability for being above the
criterion exceeds 2/3. The general result is that we shall achieve the
smallest expected loss if we match the posterior odds to the loss ratio.
Thus, if the loss ratio is 2 to 1 (false advance to false retain), a
probability of 2/(2 + 1) gives matching odds (2/3 to 1/3, above criterion to
below criterion).

Table 5

Losses Associated With Incorrect Decisions

                    True Level
Decision       π ≥ π_o     π < π_o
Advance           0           a
Retain            b           0


To express the result symbolically, consider the notation of Table 5.
Here a is the loss associated with advancing a student whose true level is
below π_o, and b is the loss for retaining a student whose true level exceeds
π_o. The decision rule which minimizes expected loss in this situation is
to advance a student if his test score is such that

     b Prob(π ≥ π_o | x, n) ≥ a Prob(π < π_o | x, n),

and to retain him otherwise. This comparison is equivalent to comparing
the loss ratio a/b to the probability ratio
Prob(π ≥ π_o | x, n)/Prob(π < π_o | x, n).
If a = b in our analysis, the decision procedure reduces to comparing
the median of the posterior distribution with the specified criterion
level. If the median is at least at this level, the student is advanced;
otherwise he is retained. In this situation, the decision procedure is
very similar to that used by Millman (1972). Though the procedure used by
Millman is not Bayesian, it is equivalent to comparing π_o with the mode
(rather than the median) of the posterior distribution given a uniform prior.
Thus, in effect, the sampling theory approach gives equal weight to all
equal intervals throughout the range of π; that is, effectively, to take it to
be uniformly distributed apriori. This is seldom a reasonable prior
specification. We might also remark that the formulation in Table 5 can be
generalized to provide for differential utilities for correctly identifying
true positives and true negatives as well as differential disutilities
(or losses) for false positives and false negatives as is done in Table 5.
To do this, negative quantities (negative disutilities = utilities) would
need to replace the zeros in Table 5, and a slightly more complicated
analysis would be used.
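The expected-loss rule can be stated compactly in code. The sketch below compares the expected loss of advancing, a times Prob(π < π_o | x, n), with the expected loss of retaining, b times Prob(π ≥ π_o | x, n), and also gives the equivalent cutoff a/(a + b) on the posterior probability of mastery. The function and argument names are hypothetical; the losses for correct decisions are taken to be zero, as in Table 5.

```python
def required_probability(loss_false_advance, loss_false_retain):
    """Posterior probability of mastery at which the two expected losses are equal."""
    return loss_false_advance / (loss_false_advance + loss_false_retain)


def should_advance(prob_mastery, loss_false_advance, loss_false_retain):
    """Advance when the expected loss of advancing does not exceed that of retaining."""
    expected_loss_advance = loss_false_advance * (1.0 - prob_mastery)
    expected_loss_retain = loss_false_retain * prob_mastery
    return expected_loss_advance <= expected_loss_retain


if __name__ == "__main__":
    # Loss ratio of 2 to 1 (false advance to false retain), as in the text.
    print(required_probability(2.0, 1.0))
    print(should_advance(0.60, 2.0, 1.0))
    print(should_advance(0.70, 2.0, 1.0))
```

With equal losses the cutoff is .5, so the rule reduces to asking whether the posterior median is at or above the criterion level.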

It may be worthwhile to summarize the situation at this point. An
instructor wishing to use test results in the context of Individually
Prescribed Instruction should be ready to supply three kinds of information.
First, a criterion level (the minimum degree of mastery required) must be
set. In Individually Prescribed Instruction this seems to run from about
.70 to about .85. Second, prior knowledge of the student's true level of
functioning must be translated into probability terms, namely a prior
probability distribution for π. Typically, a carefully monitored program
will be such as to suggest a prior probability distribution that assigns a
probability of just more than .50 to the region above the criterion level.
If this is not the case, the general efficacy of the program should be
re-evaluated. A program that results in a much higher probability may be
wastefully long, and one that results in a lower probability may require
strengthening. Finally, the relative losses associated with the two types
of incorrect decisions must be assessed. A ratio of more than 1/1 is the
rule (we are told), with ratios of 1.5/1 and 2/1 being common, and ratios
as high as 3/1 not being rare.

It should be clear that all three of the above determinations will
have an influence on the minimum necessary test length. As the criterion
level approaches unity, the test must be longer in order to provide adequate
information about a student's level of functioning in the neighborhood
of the criterion. If prior probabilities of mastery are sufficiently high,
very short tests become possible, but this is not and should not be the
typical case. Finally, higher loss ratios require longer tests to allow
the possibility of high posterior probability of mastery. We shall also
see that greater test lengths are sometimes required because of the obvious
restriction to integer valued sample sizes.

A Design For Test-Length Specification

The characteristics of the group of students being tested must now
be considered as they relate to test-length specification. Each member
of the group of students tested has been exposed to the same instruction
program under identical local conditions. If a particular student is
not considered atypical for this group, then our prior beliefs about his
true level of functioning should closely reflect the true distribution
of levels of functioning found in that group. Indeed, elaborate formal
procedures for, effectively, bootstrapping a prior distribution using,
for each examinee, the test results of the remaining m - 1 examinees are
described by Novick, Lewis, and Jackson (1973). Thus, group characteristics,
through their effect on our prior distributions, do affect test-length
specification. If the average test score of the group is high (i.e.,
above the criterion level) and there is little variation among individuals,
shorter tests become feasible.

Table 6

Selected Prior Distributions For IPI Advancement Decisions

                          Effective
           Prior              Prior                        Prob(π in interval)
No.    Distribution     Sample Size  Mean    .00-.70   .70-.75   .75-.80   .80-.85   .85-.90  .90-1.00
  1     β(5.6, 2.4)               8   .70        .46       .12       .12       .12  [illeg.]       .08
  2       β(6, 2)                 8   .75        .33       .12       .13       .14  [illeg.]       .15
  3     β(6.4, 1.6)               8   .80        .21       .10       .12  [illeg.]  [illeg.]  [illeg.]
  4     β(6.8, 1.2)               8   .85        .12       .07       .09       .13  [illeg.]       .42
  5     β(7.2, 0.8)               8   .90        .05       .04       .06       .09  [illeg.]       .62
  6       β(7, 3)                10   .70        .46       .14       .14       .12       .09       .05
  7     β(7.5, 2.5)              10   .75        .32       .13       .15       .15       .13       .12
  8       β(8, 2)                10   .80        .20       .10       .14       .16       .17       .23
  9     β(8.5, 1.5)              10   .85        .10       .07       .10       .14       .19       .40
 10       β(9, 1)                10   .90        .04       .03       .06       .10       .16       .61
 11     β(8.4, 3.6)              12   .70        .47       .15       .15       .12       .08       .03
 12       β(9, 3)                12   .75        .32       .14       .16       .16       .13       .09
 13     β(9.6, 2.4)              12   .80        .18       .11       .15       .18       .18       .20
 14    β(10.2, 1.8)              12   .85        .09       .07       .11       .16       .20       .37
 15    β(10.8, 1.2)              12   .90        .03       .03       .06       .11       .17       .60
 16    β(10.5, 4.5)              15   .70        .47       .17       .16       .12       .06       .02
 17   β(11.25, 3.75)             15   .75        .30       .16       .18       .17       .13       .06
 18      β(12, 3)                15   .80        .16       .12       .17       .20       .19       .16
 19   β(12.75, 2.25)             15   .85        .07       .07       .12       .18       .23       .33
 20    β(13.5, 1.5)              15   .90        .02       .03       .06       .11       .19       .59

*Note: All entries have been rounded to two decimal places and smoothed so
that the row totals add to 1.00.

Since, in practice, prior distributions will be based upon on-site
experience, there will, of course, be different prior distributions
for different sites. What we shall attempt to do here is to show what
sample sizes will be required for a broad range of prior distributions
and loss ratios. What we need to do now, therefore, is to consider certain
combinations of prior distributions, criterion levels, and loss ratios,
and see what sample size will be adequate in each case.

For our analyses, we shall consider twenty different prior distributions
for the level of functioning π, four criterion levels, and four
loss ratios. For each criterion level, we shall consider all four loss
ratios and four of the prior distributions. The four loss ratios we
shall use are 1.5, 2.0, 2.5, and 3.0. The respective probabilities
P = Prob(π > π_o) required for advancement [given by setting P/(1 - P)
equal to the loss ratios, a/b] are .60, .67, .71, and .75. Thus, with a
loss ratio of 3.0, the posterior probability that the student's level of
functioning is greater than the specified criterion level must be at least
.75, if he is to be advanced.
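
The correspondence between loss ratio and required probability can be checked with a few lines of Python. This is only an illustrative sketch: the function name `required_probability` is our own, and the code simply solves P/(1 - P) = a/b for P.

```python
def required_probability(loss_ratio):
    """Solve P / (1 - P) = loss_ratio for P, the posterior probability
    of mastery that must be reached before a student is advanced."""
    if loss_ratio <= 0:
        raise ValueError("loss ratio must be positive")
    return loss_ratio / (1.0 + loss_ratio)


# The four loss ratios considered in the text.
for ratio in (1.5, 2.0, 2.5, 3.0):
    print(f"loss ratio {ratio}: required probability "
          f"{required_probability(ratio):.2f}")
```

Rounded to two places, the four values agree with the .60, .67, .71, and .75 quoted above.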

The twenty prior probability distributions we shall be considering
are given in Table 6 where they have been grouped in blocks of five,
each block having a distribution with the respective mean values .70, .75,
.80, .85, and .90. The blocks differ with respect to the concentration of
the prior distributions. Within block, the distributions differ with
respect to their mean values. Note that in the first block the arguments
of each Beta distribution sum to 8, e.g., 5.6 + 2.4 = 8. This indicates
that the amount of prior information contained in each of these distributions
is equivalent to what would be gained from a test containing eight items. If,
given one of these prior distributions and some criterion level and loss ratio,
we specify an eight-item test, our posterior distribution will contain
information equivalent to that contained in 16 observations. This contrasts
with the classical procedure which uses no prior information. It is this
increment in information that is equivalent to prior observations which
permits a reduction in test length when a Bayesian procedure is used.
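
The bookkeeping behind "equivalent observations" can be sketched in Python, assuming the standard Beta-binomial updating rule (x correct answers on n items take β(a, b) to β(a + x, b + n - x)). The function names are ours, and the outcome of 6 correct on 8 items is a hypothetical illustration; the prior β(6, 2) is one whose arguments sum to 8.

```python
def update_beta(a, b, correct, items):
    """Posterior Beta arguments after `correct` right answers on `items`
    items, starting from a Beta(a, b) prior."""
    if not 0 <= correct <= items:
        raise ValueError("correct must lie between 0 and items")
    return a + correct, b + (items - correct)


def equivalent_observations(a, b):
    """Sum of the two Beta arguments, read as a number of observations."""
    return a + b


prior = (6, 2)  # arguments sum to 8: worth eight prior observations
posterior = update_beta(*prior, correct=6, items=8)
print(posterior)                            # (12, 4)
print(equivalent_observations(*posterior))  # 16 = 8 prior + 8 observed
```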

The first problem in doing an analysis is that of selecting a reasonable
prior distribution. For the present application, we would first need to
ask ourselves what we would expect to find as the mean level of functioning
in our posttest group. With a specified criterion level of .70, we might
hope for a mean level of functioning of .70. Thus, we would have people in
training until such time as we would "expect" them to be qualified. Since
loss ratios are typically greater than one, some overtraining may be thought
to be useful, but as we shall see, excessive overtraining may be wasteful.
Suppose, for concreteness, that we believe the mean population level
of functioning to be .70. Distributions 1, 6, 11, and 16 satisfy this
condition, and, hence, we may choose from among these. We note that
these distributions are in an increasing order of tightness, as may most
conveniently be seen in the probability assignment given in the last column
to the interval (.90, 1.00). These probabilities are respectively .08,
.05, .03, and .02. We need to ask ourselves which of these values seems
most reasonable, and this then will give us some preference among these
prior distributions. We might consider the relative weight of prior
information assumed by each prior distribution (8, 10, 12, and 15 equivalent
prior observations, respectively), and this should help to narrow our
focus to one or two adjacent prior distributions for this, or any other
application. Since the authors of this paper cannot know what an
appropriate prior distribution will be in applications they have not seen,
it will be most helpful, we think, to work out sample size allocations
for several prior distributions and leave the final selection to those
"in the field". We believe that the prior distributions, loss ratios,
and specified criterion levels used here are typical of those found in
practice, and, therefore, that the specific results we shall obtain will
be useful. However, if other combinations present themselves, we believe
that the general methodology that we are demonstrating should be adequate
to the problem. Actually we shall find that most of our specifications
are very robust with respect to the choice of prior distribution within the
range we have considered.



Some Specific Test Length Recommendations
In Table 7, we give recommended sample sizes and minimum advancement
scores for π_o = .70, (a/b) = 1.5, 2.0, 2.5, 3.0 and prior distributions
1, 6, 11, and 16. The values that we have settled on for the body of
                               Table 7

            Recommended Sample Sizes and Advancement Scores

                              π_o = .70

                                            Loss Ratio
Prior
Distribution¹    E(π)     1.5 (.60)      2.0 (.67)      2.5 (.71)      3.0 (.75)

β(5.6, 2.4)      (.70)    6/8 (.62)      10/13 (.70)    11/14 (.74)    12/15 (.78)
β(7, 3)          (.70)    6/8 (.61)      10/13 (.69)    11/14 (.73)    12/15 (.77)
β(8.4, 3.6)      (.70)    6/8 (.61)      10/13 (.68)    11/14 (.72)    12/15 (.76)
β(10.5, 4.5)     (.70)    9/12 (.62)²    10/13 (.67)    11/14 (.71)    12/15 (.75)

General
Recommendations           6/8 (75%)      10/13 (77%)    11/14 (79%)    12/15 (80%)

¹ Apriori, Prob(π ≥ .70) for each of the four prior distributions is
  .54, .54, .53, and .53.

² For 6/8, Prob(π ≥ .70) = .598.

this table are not, in every instance, optimum in any statistical sense,
though we are confident that the risks associated with these decision rules
are in every case insignificantly different from the risks of the optimum
procedures. In selecting values for this table we have sought sample
sizes and minimum advancement scores that would be very efficient over
a wide range of prior distributions. That we have been successful in this
endeavor is confirmed by our ability to give general recommendations
that hold throughout the range of prior distributions studied. Actually, in
only one instance have we cheated (see Footnote 2, Table 7), but again
the increase in expected loss will be trivial. We would also note that
the required percentage correct and the number of required observations
increase as the loss ratio increases, which "makes sense" on intuitive
grounds.

A rough indication of the near optimality of any of the individual
specifications can be gained from the closeness of the aposteriori
probability (indicated in parentheses following the specification) with
the value required by the particular loss ratio (given in parentheses
at the top of the column). Thus, with the prior distribution β(7, 3), the
decision rule "six out of eight", abbreviated 6/8, leads to the aposteriori
distribution β(13, 5) and to Prob(π > .70) = .61 which is just .01 greater
than the required level .60 for the loss ratio 1.5 (1.5 to 1). In this
instance, the specified decision rule may be very good. On the other
hand, consider the prior distribution β(5.6, 2.4). Here the rule 11/14
leads to a value .74 when only .71 is required for a 2.5 to 1 loss ratio.
Actually, the specification 8/10 is somewhat better, giving a posterior
probability of .729. Also for the prior distribution β(7, 3), the posterior
probability with 8/10 is .718. With the loss ratio 2.0 and the
prior β(5.6, 2.4), the rule 7/9 leads to the posterior probability
[value illegible] as compared to the desired value of .67. In every case
where we have specified an "almost best" decision rule, the result has been
an increase in the specified sample size and the purpose has been to obtain
uniformity of specification over a reasonably wide range of amounts of prior
information. Considering our general ignorance concerning what might be an
appropriate prior distribution in specific applications, the specifications
we have given should be the more generally useful.
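
For posterior distributions whose two arguments are whole numbers, the quoted probabilities can be reproduced without special software. The sketch below is not the authors' computing procedure; it relies on the standard identity linking the Beta and binomial distributions, Prob(π > c) = Prob(Binomial(a + b - 1, c) ≤ a - 1) for integer a and b, and the function name is our own.

```python
from math import comb


def beta_tail_integer(a, b, c):
    """P(pi > c) for pi ~ Beta(a, b) with positive integer a and b, using
    the identity P(pi > c) = P(Binomial(a + b - 1, c) <= a - 1)."""
    n = a + b - 1
    return sum(comb(n, k) * c**k * (1 - c)**(n - k) for k in range(a))


# Prior beta(7, 3) followed by 6 correct out of 8 gives beta(13, 5).
print(round(beta_tail_integer(13, 5, 0.70), 3))  # 0.611

# Prior beta(7, 3) followed by 8 correct out of 10 gives beta(15, 5).
print(round(beta_tail_integer(15, 5, 0.70), 3))  # 0.718
```

Priors with fractional arguments, such as β(5.6, 2.4), fall outside this identity and need numerical integration or an incomplete Beta routine.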

Another indication of how good a particular specification is can be
inferred from the closeness of the percentage correct required by the
advancement rule to the specified criterion level. Clearly, if the
percentage required by the advancement rule is very much larger than the
specified criterion level, a large percentage of qualified students will
be retained and this is undesirable, particularly for small loss ratios.
For large loss ratios, this is less important and hence higher advancement
ratios can, and will need to, be tolerated. This feature is exhibited in
Table 7, where the advancement ratios increase with increasing loss ratios.
One can, of course, keep the advancement ratio down very close to the
specified criterion level even for higher loss ratios, but only by having much
larger sample sizes. For example, with the prior distribution β(5.6, 2.4),
the specified criterion level π_o = .70 and the loss ratio 2.0, the advancement
ratio 72/100 is satisfactory since Prob(π > .70 | 72/100) = .675, but
the indicated sample size is unacceptable.
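
The search for a minimum advancement score at a given test length can be sketched as follows. This is a hedged illustration, not the procedure used to build the tables: the function names are ours, the tail probability is approximated by Simpson's rule rather than by an incomplete Beta routine, and the code assumes both posterior arguments exceed 1 so that the density stays finite.

```python
import math


def beta_tail(a, b, c, steps=2000):
    """Approximate P(pi > c) for pi ~ Beta(a, b) by Simpson's rule on [c, 1].
    Assumes a > 1 and b > 1; `steps` must be even."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def density(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(x)
                        + (b - 1) * math.log(1 - x))

    h = (1.0 - c) / steps
    total = density(c) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(c + i * h)
    return total * h / 3.0


def minimum_advancement_score(a, b, items, criterion, required):
    """Smallest number correct out of `items` for which the posterior
    P(pi > criterion) reaches `required`; None if no score does."""
    for correct in range(items + 1):
        if beta_tail(a + correct, b + items - correct, criterion) >= required:
            return correct
    return None


# Prior beta(7, 3), criterion .70, loss ratio 1.5 (required probability .60).
print(minimum_advancement_score(7, 3, 8, 0.70, 0.60))  # 6
```

Repeating the search over several test lengths and priors shows how a single rule can serve a range of prior distributions, which is the property the general recommendations aim for.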
                               Table 8

            Recommended Sample Sizes and Advancement Scores

                              π_o = .75

                                            Loss Ratio
Prior
Distribution¹     E(π)     1.5 (.60)      2.0 (.67)      2.5 (.71)      3.0 (.75)

β(6, 2)           (.75)    8/10 (.65)     16/20 (.70)    17/21 (.74)    18/22 (.77)
β(7.5, 2.5)       (.75)    8/10 (.64)     16/20 (.69)    17/21 (.73)    18/22 (.76)
β(9, 3)           (.75)    8/10 (.63)     16/20 (.69)    17/21 (.72)    18/22 (.75)
β(11.25, 3.75)    (.75)    8/10 (.62)     16/20 (.68)    17/21 (.71)    [illegible]²

General
Recommendations            8/10 (80%)     16/20 (80%)    17/21 (81%)    18/22 (82%)

¹ Apriori, Prob(π ≥ .75) = .56, .55, .55, and .54, respectively, for the
  four prior distributions used in Table 8.

² For 18/22, Prob(π > .75) = .744.


                               Table 9

            Recommended Sample Sizes and Advancement Scores

                              π_o = .80

                                            Loss Ratio
Prior
Distribution      E(π)     1.5 (.60)      2.0 (.67)      2.5 (.71)       3.0 (.75)

β(6.4, 1.6)¹      (.80)    6/7 (.66)      7/8 (.70)      17/20 (.72)     19/22 (.78)
β(8, 2)           (.80)    6/7 (.65)      7/8 (.69)      17/20 (.72)     [illegible]
β(9.6, 2.4)       (.80)    6/7 (.64)      7/8 (.68)      17/20 (.71)     19/22 (.76)
β(12, 3)          (.80)    6/7 (.63)      7/8 (.67)      18/21 (.73)²    19/22 (.75)

General
Recommendations            6/7 (86%)      7/8 (88%)      17/20 (85%)     19/22 (86%)

¹ Apriori, Prob(π ≥ .80) = .57; for 8/10, Prob(π > .80) = .55; for 16/20,
  Prob(π > .80) = .54; for 8.5/10, Prob(π ≥ .80) = .67; for 8.3/10,
  Prob(π ≥ .80) = .62; for 9/10, Prob(π > .80) = .78.

² For 17/20, Prob(π > .80) = .70.



Note that for each of the prior distributions used in Table 7,
Prob(π ≥ .70) > .50. Thus, on an apriori basis, advancement would be
indicated with a loss ratio 1.0. This will generally be true for the prior
distributions we shall be adopting for our analyses. The point is that
loss ratios of 1.0 are not (we are told) typical of IPI applications, and
if test lengths are to be kept reasonable it will be necessary to use
training programs that give mean output at or above the criterion level.
There has been a definite tendency in IPI to require relatively high
advancement ratios; typically, the value .85 is used. One might well
speculate whether this is a function of a high loss ratio combined with
a desire for a short test length, or whether it really reflects a perceived
need for a high criterion level. (For example, an advancement ratio of 6/7
with the prior distribution β(5.6, 2.4) would yield with x = 6 a posterior
Prob(π > .70) = .77 which would be just right with a loss ratio of 3.0.)
The authors of this paper do not know the answer to this question, but hope
that those within IPI will want to consider it carefully. Only through
such serious consideration can the test length problem be "solved".

Some recommended test lengths for π_o = .75 and four prior distributions
with E(π) = .75 are given in Table 8. Again we have been able to specify
one generally satisfactory advancement ratio for each of the four loss
ratios. We note that the required test lengths for π_o = .75 are rather
larger than for π_o = .70. In Table 8, we find very short required test
lengths for a 1.5 loss ratio and rather long ones for loss ratios of 2.0,
2.5, and 3.0.

In Table 9, we provide recommendations for π_o = .80 when E(π) = .80.
The results here parallel those of Table 8, except that the advancement
ratios are very high as compared to the criterion levels. This is
relatively unsatisfactory. In Footnote 1 to Table 9, we indicate the formal
results for the prior distribution β(6.4, 1.6) and the sample result
"8.5" correct and "1.5" incorrect and also for "8.3" correct and "1.7"
incorrect. These provide very nice results for loss ratios of 2.0 and 1.5,
respectively. Unfortunately, these are unobtainable sample results. This
demonstrates that, in part, large required test lengths may sometimes be
due to the discreteness, and hence discontinuity, of our possible
experimental outcomes. This also suggests that the precise specification of
the advancement rules may be highly sensitive to the mean value of the prior
distribution even if it is proving to be relatively insensitive to the
total amount of information contained in the prior distribution, which is
indicated by the sum of the two parameters of the Beta distribution.

For example, given the prior distribution β(6.4, 1.6) and the
impossible sample result x = 8.3, n = 10, we have the posterior
distribution β(14.7, 3.3) which, as we indicated previously, gives
Prob(π > .80) = .62, which suggests that the advancement ratio 8.3/10
might be very favorable with a loss ratio of 1.5. But suppose we had
just a slightly different prior distribution, namely, β(6.7, 1.3) with
E(π) = .84; then the sample result x = 8, n = 10 would yield the posterior
distribution β(14.7, 3.3) and thus, for the reasons given above, indicate
that the advancement ratio 8/10 might be attractive. This advancement
ratio is clearly more attractive than the ratio 6/7, despite the fact that
it requires three additional items, because this ratio 8/10 = 80% is closer
to the criterion level than is the advancement ratio 6/7 = 86%.
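
The two routes to the same posterior distribution can be written out as a short worked equation, assuming the usual Beta-binomial updating rule in which x correct answers on n items take β(a, b) to β(a + x, b + n - x):

```latex
\begin{align*}
\beta(6.4,\,1.6),\quad x = 8.3,\ n = 10
  &\;\Longrightarrow\; \beta(6.4 + 8.3,\ 1.6 + 1.7) = \beta(14.7,\,3.3),\\
\beta(6.7,\,1.3),\quad x = 8,\ n = 10
  &\;\Longrightarrow\; \beta(6.7 + 8,\ 1.3 + 2) = \beta(14.7,\,3.3).
\end{align*}
```

The second prior has mean 6.7/8 = .8375, which rounds to the .84 quoted in the text.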

Because of this relatively high dependence of the results on the
expected value of the prior distribution, it seems important to attempt
some study of the variation of our results as a function of changes in
                               Table 10

            Recommended Sample Sizes and Advancement Scores

                              π_o = .80

                                            Loss Ratio
Prior
Distribution      E(π)     1.5 (.60)      2.0 (.67)      2.5 (.71)       3.0 (.75)

β(6.8, 1.2)⁵      (.85)    8/10 (.64)     9/11 (.69)     10/12 (.72)¹    11/13 (.76)
β(8.5, 1.5)       (.85)    8/10 (.66)     9/11 (.70)     10/12 (.73)²    11/13 (.76)
β(10.2, 1.8)      (.85)    8/10 (.67)     9/11 (.71)     9/11 (.71)³     11/13 (.77)
β(12.75, 2.25)    (.85)    8/10 (.69)     9/11 (.72)     9/11 (.72)⁴     11/13 (.78)

General
Recommendations            8/10 (80%)     9/11 (82%)     10/12 (83%)     11/13 (85%)

¹ For 5/6, Prob(π ≥ .80) = .72.

² For 5/6, Prob(π ≥ .80) = .73.

³ For 10/12, Prob(π > .80) = .74.

⁴ For 10/12, Prob(π > .80) = .75.

⁵ For the four prior distributions, the apriori probabilities of π > .80
  are .72, .73, .74, and .75. With these prior distributions and with 7/10,
  the posterior probabilities of π > .80 are .41, .43, .46, and .48.

our prior distribution. For this reason, we have in Table 10 redone our
sample size recommendations under the assumption that the mean of our
prior distribution is .85 instead of .80.

Surely the practitioner will find the sample size recommendations
of Table 10 to be attractive. Apparently with these prior distributions,
test lengths need be no greater than 13 for any of the listed loss ratios.
With the prior distributions having E(π) = .80, a sample size of 22 is
required when the loss ratio is 3.0.

What is happening is that we are beginning with fairly strong beliefs
that π ≥ π_o so that not much data, in confirmation, is required even for
high loss ratios. In fact, even on an apriori basis, an advancement
decision would be made for all loss ratios up to and including 2.5.
Indeed, we see that the function of the sample data here is to provide
the possibility of obtaining some information that might change the
decision to retention. For example, an observed performance ratio of
10/13 with the prior distribution β(6.8, 1.2) would give aposteriori
Prob(π ≥ .80) = .72, and hence, the student would be retained if the
loss ratio were 3.0 (see also Footnote 5, Table 10).

We believe that the comparison of the specifications in Tables 9
and 10 has important implications for IPI management. When loss ratios
are high, it may well be highly advantageous to strengthen the training
program to the extent that the mean output is well above the specified
criterion level. This will make it possible to use short tests or,
alternatively, will generally reduce the risk of incorrect classification.
This will, of course, be more expensive, and this investment must be balanced
out against the reduction in the cost of testing and the reduction in the
expected loss due to incorrect decision. The final Table, Table 11, looks
                               Table 11

            Recommended Sample Sizes and Advancement Scores

                              π_o = .85

                                            Loss Ratio
Prior
Distributions¹    E(π)     1.5 (.60)      2.0 (.67)      2.5 (.71)      3.0 (.75)

β(6.8, 1.2)       (.85)    7/8 (.62)      9/10 (.70)     17/19 (.73)    18/20 (.76)³
β(8.5, 1.5)       (.85)    7/8 (.62)      9/10 (.69)     17/19 (.72)    19/21 (.77)
β(10.2, 1.8)      (.85)    7/8 (.61)      9/10 (.68)     17/19 (.72)    19/21 (.76)
β(12.75, 2.25)    (.85)    [illegible]    9/10 (.67)²    17/19 (.71)    19/21 (.75)

General
Recommendations            7/8 (87.5%)    9/10 (90%)     17/19 (89%)    19/21 (90%)

¹ The apriori probabilities for π > .85 are .59, .58, .58, and .57.

² For 10/11, Prob(π > .85) = .695.

³ For 19/21, Prob(π > .85) = .78.




very much like Table 9 as far as test lengths are concerned. Here again
some robust length assignments are obtained, though again, the lengths for
the high loss ratios border on being discomforting. This can be corrected
by training to an average level of functioning of .90. With the prior
distribution β(7.2, .8), we find that Prob(π ≥ .85) = .76, apriori. Observing
6/7 yields Prob(π ≥ .85) = .70, while 5/7 yields a value of .41. Observing
8/9 yields .77, while 7/9 yields [value illegible]. Thus, shorter test
lengths are again possible if the students are trained to a sufficiently
high average standard.

Some Summary Remarks
The test length recommendations given in this paper are meant to be
taken seriously and hopefully they will soon be adopted on a provisional and
experimental basis, so that more experience can be gained while some of
the theoretical and substantive issues raised in this paper are debated. The
questions of level of functioning required to define mastery and the
relative losses incurred in making false positive and false negative decisions
require serious discussion and consensus. We also need to get some clear
picture of what kinds of distributions of outcomes are to be expected as this
determines the amount of prior information available in making individual
assessments. This third issue is, as we have indicated, intimately related
to the expected level of functioning that is sought in the group being trained.
Hopeful and possible outcomes of such discussions could be a consensus that:

1. In most situations a level of functioning of something less than
   .85 is satisfactory. A value as low as .75 would be highly
   desirable. This could be accomplished by redefining the task
   domain slightly to eliminate very easy items.

2. Training should be carefully monitored so that expected group
   performance will be just slightly higher than the specified
   criterion level. This will keep training time and testing time
   relatively low.

3. The program should be structured so that very high loss ratios are
   not appropriate. That is to say, individual modules should not
   be overly dependent on preceding ones.

One problem that does not arise with Bayesian methods is any complication
if sequential methods are used. Items can simply be administered until
it is clear that a student will definitely, or cannot possibly, attain the
minimum advancement score. Thus with a minimum advancement score of 8/10,
testing can cease as soon as eight successes or three failures are observed.
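
The stopping rule just described can be sketched in Python. The function name `administer` and the representation of responses as a sequence of booleans are our own illustrative choices, not part of the paper.

```python
def administer(responses, need, length):
    """Give items one at a time and stop once the outcome is settled.

    responses: iterable of booleans (True = item answered correctly).
    need:      minimum advancement score (e.g. 8).
    length:    full test length (e.g. 10).
    Returns (decision, items_used), decision being "advance" or "retain".
    """
    max_failures = length - need  # failures that still leave `need` reachable
    successes = failures = 0
    for correct in responses:
        if correct:
            successes += 1
        else:
            failures += 1
        if successes >= need:
            return "advance", successes + failures
        if failures > max_failures:
            return "retain", successes + failures
    raise ValueError("ran out of responses before the outcome was settled")


# With the rule 8/10, three failures end the test after only four items.
print(administer([False, True, False, False], need=8, length=10))
```

Because the decision depends only on whether the minimum advancement score is reached, stopping early in this way leaves every advance or retain decision unchanged.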

Two issues have been treated in a rather gross way in this paper and
on these important issues further research needs to be done. First, it
must be recognized that while the threshold loss function we have adopted
here is a better approximation to reality than, for example, Livingston's
criterion-centered squared-error loss (see Hambleton and Novick, 1973),
it is only a gross approximation to be used while better and more complicated
approximations are being investigated. Three that immediately come to mind
are:

1. A threshold loss function with an indifference region in which
   there is zero loss for false positive or false negative errors.

2. A negative squared-exponential loss used with the root arcsine
   transformation parameter

          γ = sin⁻¹ √π

3. A cumulative Beta distribution loss function.

We expect that these loss functions will give somewhat different and
[illegible] length specifications than those obtained here, but the overall
decrease in expected loss may or may not be great. We should also remark
that these recommendations are specifically made for first-time-through
decisions. We have yet to consider the problem of decisions for students
repeating a unit.

Finally, we would remark that one of the important issues that we
identified at the outset of this paper has been handled in a most casual
and informal manner. To do other than this would have enormously complicated
the analysis and delayed substantially the appearance of our recommendations.
We refer explicitly to the premium on testing time within the instructional
process and implicitly to an implied trade-off between training and testing
time. A completely general analysis would consider an available time T and
an allocation of T into instruction and testing times i + t = T, so as to
maximize a payoff function which would have a (possibly differential) positive
payoff for each module successfully completed, and a (differential) negative
payoff for an incorrect decision of either type. We are reluctant to undertake
such a sophisticated analysis until such time as the operating conditions
of IPI are more clearly defined.

For the present paper we have implicitly adopted some guidelines which
effectively say that it is very desirable to have test lengths of 12 or
less, tolerable but undesirable to have test lengths as high as 20 and
discomforting to have tests that are longer than this. We have also taken
the position that a decision should not be made on the basis of prior and
collateral information alone but that it should be confirmed by a test
that permits demonstration of nonmastery. As in all of the judgmental
decisions made in this paper we have been guided by counsel from experienced
IPI personnel, particularly Richard Ferguson and Anthony Nitko to whom
we are much indebted. The value of this paper will largely be determined
by the quality of the discussion engendered by it among such people.


